Week 7 - Discussion Questions

These are example discussion points for you to think about before class. You are not expected to engage with all of them — pick the ones that speak most directly to your own research, and bring two or three rough answers to the in-class session. The full description of how to use these pages, including what the question tags mean, is on the Week 1 Discussion page.

Sub-lessons

AI-Assisted Data Analysis in Practice

Calibrate The lesson surfaces the “silent error” problem: the case where AI produces a plausible-looking analysis that is quietly wrong. Pick a recent example from your own work or a colleague's. What would have caught it short of rerunning the whole pipeline from scratch?
Apply For a representative dataset from your field, draft a one-paragraph spec of what you would let AI clean and what you would still do by hand. Where is the line, and what evidence would move it?
Critical “Domain expertise is the essential complement” is a comforting line for researchers. Where is it actually true, and where is it a fig leaf for resisting tool adoption?
Connect The silent-error problem here is the data-analysis cousin of the hallucinated-citation problem in Week 5. Are they instances of the same failure mode, or genuinely different problems that happen to share a vibe?

Natural Language to Code

Calibrate Use a natural-language-to-code tool to generate a small analysis script in your area. Without running it, predict where it will be wrong. Then run it. How well did your prediction match the actual failure points?
Apply Map your programming confidence honestly: which of your research tasks are safe for natural-language-to-code (you can verify the outputs without reading the code), and which are not (verification requires reading the code line by line)?
Critical The lesson presents “vibe coding” as both promise and peril. Is it a transient phenomenon that fades as researchers gain practice, or a stable new mode that we should design protocols around?
Connect The free-vs-paid quality gap discussed here continues the free-vs-paid tier discussions in Week 5 (literature tools) and Week 6 (writing tools). Are you consistently on the same side of this question across all three weeks, or do you pay for some categories and not others? What does that pattern reveal about where you actually believe AI does meaningful work?

Verification of AI-Generated Code

Calibrate Read a piece of AI-generated code from a recent session. Identify three of the lesson's “common AI code failure patterns” in it; if you find none, look harder. Which pattern is most dangerous in your field?
Apply Design a personal verification checklist for AI-generated analysis code you will actually run. Make it specific enough that another researcher could apply it to your work and reach roughly the same conclusion about which code is trustworthy.
Critical The lesson's claim is that verification matters more than generation. Steel-man the opposite case: are there research contexts where generation matters more, and verification is overhead that slows things down without proportional gain?
Connect Week 5's citation-verification exercises and Week 6's writing-audit techniques are both versions of the same underlying habit: read what the AI gives you as if a colleague had handed it to you. Compare how that habit plays out in citation checking, prose auditing, and code reading. Which is hardest to do well in practice, and what makes it hardest?

Visualisation with AI

Calibrate Ask an AI tool to make a publication-grade figure from a small dataset of yours. Read the output against the lesson's list of good-visualisation principles. Where does it violate them by default, and what would you have to do to get it right?
Apply Sketch the kinds of figures in your work where AI-generated visualisation genuinely saves time, and the kinds where it is actively misleading. What underlying property separates them?
Critical Accessibility (colour-blind palettes, alt text, etc.) is treated as a separable concern. Is it — or are accessibility violations a symptom of a deeper problem with how AI visualisation defaults are chosen?
Connect Week 2's image-generation lesson covered how diffusion models produce plausible-looking outputs without any underlying ground truth. AI-generated scientific visualisations have a similar property: they look right because they are stylistically right, not because they are evidentially right. Compare the failure modes of decorative AI image generation with the failure modes of analytic AI visualisation. Where are they the same problem in different clothes, and where are they genuinely different?

Agentic Data Analysis

Calibrate Run a small agentic data-analysis session (Claude Code or an equivalent agent) on a real but low-stakes question. Compare what the agent claimed to do with what the artifacts (code, logs, intermediate files) show it actually did. Where do they diverge?
Apply Draft a one-page CLAUDE.md for an agentic data-analysis session on your own data. Be specific about what context the agent needs, what tools it should and shouldn't use, and what counts as a successful outcome.
Critical Agentic analysis is presented as a paradigm shift. Is the shift real for your kind of analysis, or is the conversational loop still the right mode for what you do?
Connect Week 5 introduced Claude Code skills and Claude Projects as an “agentic literature workflow.” Week 6 introduced an agentic writing workflow, and this sub-lesson introduces agentic data analysis. Compare the three. What stays the same about your verification habits across all three, and what genuinely has to change?

Building Your Data Analysis Workflow

Calibrate The lesson's “principled workflow” expects you to keep AI in a specific role. At which points in the workflow do you currently let AI overstep that role most often, and what would it take to stop?
Apply Pick a real research question and walk it through the principled workflow in detail (research question → prompts → verification → interpretation). What surprises you about where the work actually lives?
Critical When does the principled workflow become bureaucratic enough that a researcher will plausibly skip it? Where would you trim it without losing what matters?
Connect Week 2 covered the scaling-laws argument that larger models produce better outputs. The “AI vs established statistical packages” question here is its practical inversion: when does a small, well-understood, decades-old statistical tool beat a frontier-scale general model on a specific research task? Sketch the conditions under which each is the right choice.

Hands-On Activities and Assessment

Calibrate Compare cohort results on Activity 1 (the Data Cleaning Challenge). Where the cleaned outputs differ, what assumptions did the AI — or you — quietly make? Be precise.
Apply Use the Interpretation Challenge to find one place where the AI's interpretation of a result was more defensible than your initial reading. What does that say about the kinds of interpretation worth delegating?
Critical The activities reward you for catching AI mistakes. Are you sure you would catch them with similar reliability when the stakes are real and the deadline is closer?
Connect The verification skills tested here are the direct descendants of Week 5's citation-verification exercises and Week 6's writing-audit exercises. Compare the three. Which skill transferred most cleanly from week to week, and which one had to be relearned almost from scratch?